gha: add AI-assisted fallback to /backport command #30290
When `git cherry-pick` in `pr_details.sh` fails with a conflict, the
type-branch job now hands off to `anthropics/claude-code-action@v1`
running the `.claude/skills/create-backport-branch/SKILL.md` skill.
If the skill resolves the conflicts, it creates an `ai-backport-pr-*`
branch on the bot fork and writes a Markdown report; the workflow then
opens the backport PR (body = skill report verbatim) and tags it with
the `ai-resolved-conflicts` label. If the skill aborts (modify/delete
or architectural unknowns), behaviour matches today: no backport PR,
fallback issue opened, ❌ reaction on the `/backport` comment.
The clean-cherry-pick path is unchanged — when `pr_details` succeeds,
the AI step is skipped and no Anthropic API call is made.
The AI step runs inside `./fork` (reusing the bot-fork checkout from
the existing workflow). `GH_REPO=$TARGET_FULL_REPO` is exported so the
skill's `gh api "repos/{owner}/{repo}/..."` resolves to
redpanda-data/redpanda rather than the bot fork that `origin` points
at. `continue-on-error: true` on `pr_details` plus `always() && ...`
on the failure-path steps keeps the existing failure handling intact
for cases where both paths fail.
Validated end-to-end on redpanda-data/test-migration. See DEVPROD-4091
for the test-migration staging and per-scenario verification.
Pull request overview
Adds an AI-assisted fallback path to the /backport GitHub Action so that when the initial cherry-pick conflicts, an automated skill attempts to resolve conflicts and, on success, opens a backport PR using the skill’s markdown report as the PR body.
Changes:
- Make the cherry-pick/details step `continue-on-error` and add an AI fallback step that runs only on cherry-pick failure.
- Load the AI skill's report + inferred backport branch into the environment and conditionally create a PR from either the normal or AI path.
- Update PR creation script to optionally use `--body-file` when an AI report is present, and label AI-assisted PRs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/workflows/scripts/backport-command/create_pr.sh | Uses an AI-generated markdown report as the PR body when available (--body-file). |
| .github/workflows/backport-command.yml | Wires AI fallback into the backport workflow, loads the report/branch, updates gating, and applies an AI-specific label. |
shfmt -i 2 -ci -s prefers unquoted variable references inside [[ ]] since bash doesn't word-split them there.
The AI path uses --body-file $AI_REPORT_FILE, which replaces the PR body entirely. The earlier loop in create_pr.sh resolves/creates backport issues for each source-PR closing issue and collects them as 'Fixes: $url, ...'. Without those lines in the PR body, merging the AI-backport PR won't auto-close the backport issues this script just created. Append the Fixes: lines onto the AI report content in a temp file and use that as the --body-file input. Non-AI path is unchanged.
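A minimal sketch of the fix described above, assuming the `Fixes:` string has already been collected by the earlier loop. The function name `build_ai_pr_body` and its argument order are hypothetical and are not taken from `create_pr.sh`.

```shell
# Hypothetical helper (not the actual create_pr.sh code): copy the AI report
# into a temp file and append the collected "Fixes: ..." text, so that
# --body-file still carries the closing keywords.
# $1 = path to the AI report, $2 = collected "Fixes: ..." string (may be empty)
build_ai_pr_body() {
  local report_file=$1 fixes_line=$2 body_file
  body_file=$(mktemp) || return 1
  cat "$report_file" > "$body_file" || return 1
  if [ -n "$fixes_line" ]; then
    # A blank line keeps the Fixes: text visually separate from the report.
    printf '\n%s\n' "$fixes_line" >> "$body_file"
  fi
  # Print the temp file path so the caller can hand it to --body-file.
  printf '%s\n' "$body_file"
}
```

The caller would then pass the printed path to `gh pr create --body-file` in place of `$AI_REPORT_FILE`, leaving the original report file untouched.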
pr_details.sh previously wrote fixing_issue_urls to $GITHUB_OUTPUT only after a successful cherry-pick. When the cherry-pick fails (which is exactly when the AI fallback kicks in), backport_failure exits 1 before that write lands, so steps.pr_details.outputs.fixing_issue_urls is empty for the AI-path create_pr step and the backport PR ends up without Fixes: links. Move the echo to right after fixing_issue_urls is computed (after the graphql call) so both the non-AI and AI paths see the same value.
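The reordering can be sketched as follows. This is an illustration only: the function names are hypothetical, and the cherry-pick is passed in as an arbitrary command rather than reproducing what `pr_details.sh` actually runs.

```shell
# Hypothetical model of the reordering: write the step output first, then
# attempt the cherry-pick, so a failing cherry-pick cannot skip the write.
publish_fixing_issue_urls() {
  # $1 = value computed from the GraphQL call (assumed to be available here)
  printf 'fixing_issue_urls=%s\n' "$1" >> "$GITHUB_OUTPUT"
}

run_pr_details() {
  local fixing_issue_urls=$1
  shift
  publish_fixing_issue_urls "$fixing_issue_urls"
  # "$@" stands in for the cherry-pick command; a non-zero return here
  # models backport_failure exiting 1 after the output has already landed.
  "$@" || return 1
}
```

With this ordering, a later step can still read `steps.pr_details.outputs.fixing_issue_urls` even when the cherry-pick command fails.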
ready for human review
dotnwat
left a comment
do i understand correctly that this attempts to automate the resolution of backport cherry-pick conflicts?
if yes, then i'm not sure how i feel about it. we have an implicit (or maybe explicit) policy that if a backport doesn't conflict we don't require a review for merging. would this side step that? are backport conflicts common?
yes, this PR adds a fallback step if current step
note: this PR only adds ai assistance to manual
Using
( v25.2.x is the noisy branch so far; v26.1.x is new enough that the sample is tiny. Rough signal: backport conflicts that a human has to pick up happen dozens of times per release line.
Thanks. Sounds like a useful tool
When `/backport` triggers a cherry-pick that conflicts, the `type-branch` job now hands off to `anthropics/claude-code-action@v1` running the `create-backport-branch` skill (already on `dev` via #30248). If the skill resolves the conflict, the workflow opens a backport PR tagged `ai-resolved-conflicts` with the skill's Markdown conflict report as the PR body. If the skill aborts (modify/delete or anything it can't confidently resolve), behaviour matches today: no backport PR, a fallback issue is opened, and the `/backport` comment gets a 👎.
The happy path is unchanged. When the plain cherry-pick succeeds, the AI step is skipped and no Anthropic API call is made; existing `/backport` requests that don't hit conflicts see no difference.
DEVPROD-4091
Implementation notes
- `pr_details` gets `continue-on-error: true` so the job can fall through to the AI step on cherry-pick failure. The existing `backport_failure` still writes `BACKPORT_ERROR` to `$GITHUB_ENV` for the fallback-issue body.
- `Load AI skill report` treats `.ai-backport-meta/report.md`'s presence as the real success signal (the action exits 0 even when the skill intentionally aborts).
- `GH_REPO=$TARGET_FULL_REPO` is exported for the AI step so the skill's `gh api "repos/{owner}/{repo}/..."` resolves to `redpanda-data/redpanda` rather than the bot fork that `origin` points at.
- The failure-path steps (`Failed reaction`, `Post Error`, `Create Issue On Error`) are gated on `always() && pr_details.outcome == 'failure' && load_ai_handoff.outcome != 'success'`. The `always()` is mandatory because `continue-on-error` would otherwise make GitHub treat the job as successful and skip these.
- `create_pr.sh` honours `$AI_REPORT_FILE` via `--body-file` when set; otherwise it uses the existing one-line `--body` string.
- The `ai-resolved-conflicts` label is pre-created on redpanda-data/redpanda.
Cost
Each AI fallback invocation is an Anthropic API call — ~40k tokens observed on the test-migration validation runs. Happy-path backports spend nothing.
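For reference, the report-presence check that `Load AI skill report` performs (see the implementation notes) could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and passing `AI_REPORT_FILE` to later steps through `$GITHUB_ENV` is assumed rather than confirmed by the workflow source.

```shell
# Hypothetical sketch: treat the report file, not the action's exit code,
# as the success signal for the AI fallback.
# $1 = checkout directory in which the skill ran
load_ai_report() {
  local report="$1/.ai-backport-meta/report.md"
  if [ ! -f "$report" ]; then
    # The action can exit 0 after an intentional abort, so a missing
    # report is the only reliable sign that no backport branch exists.
    echo "no AI report found; treating the skill run as aborted" >&2
    return 1
  fi
  # Assumption: later steps read AI_REPORT_FILE from the job environment.
  printf 'AI_REPORT_FILE=%s\n' "$report" >> "$GITHUB_ENV"
}
```

A non-zero return from this check is what lets the existing failure-path steps run when the skill aborts.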
Backports Required
Release Notes
Test plan
Staged and validated end-to-end on redpanda-data/test-migration (see DEVPROD-4091 for links to the three scenario runs). Redpanda is the hot repo, so no pre-merge test; post-merge controlled verification plan:
- `/backport v25.3.x` on misc: `use-after-move` vlog fixes & remove `NOLINTNEXTLINE` #30227 (small use-after-move fix, `merge-tree` predicts 0 conflicts against all recent release branches). Expect: `pr_details` succeeds, AI step skipped, backport PR with `kind/backport` only, no Anthropic cost.
- `/backport v25.3.x` on kafka/protocol: bound parse_tags by remaining message bytes #30191 (kafka/protocol: bound parse_tags, `merge-tree` predicts 2 content conflicts in `transport.cc` and `protocol_utils.cc`). Expect: AI step runs, backport PR opens on `ai-backport-pr-30191-v25.3.x-<ts>` with labels `kind/backport` + `ai-resolved-conflicts`, PR body is the skill's Markdown report.
- `/backport v25.2.x` on kafka/protocol: bound parse_tags by remaining message bytes #30191 (`transport.cc` is deleted on v25.2.x). Expect: AI aborts, fallback issue created, no `ai-backport-pr-*` branch pushed to `vbotbuildovich/redpanda`, 👎 reaction on comment.
Rollback: revert this commit. Label stays (harmless).
🤖 Generated with Claude Code